Network data congestion management probe system

ABSTRACT

A system to investigate congestion in a computer network may include network devices to route data packets throughout the network. The system may also include a source node that sends a probe packet to the network devices to gather information about the traffic queues at each network device that receives the probe packet. The system may further include a routing table at each examined network device that is based upon the gathered information for each respective traffic queue.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to the field of computer systems, and, more particularly, to addressing data congestion and its management.

2. Description of Background

Generally, conventional Ethernet fabrics are dynamically routed. In other words, packets are directed from one switch node to the next, hop by hop, through the network. Examples of protocols used include Converged Enhanced Ethernet (CEE), Fibre Channel over Converged Enhanced Ethernet (FCoCEE), and Data Center Bridging (DCB), as well as proprietary routing schemes.

SUMMARY OF THE INVENTION

According to one embodiment of the invention, a system to investigate congestion in a computer network may include network devices to route data packets throughout the network. The system may also include a source node that sends a probe packet to the network devices to gather information about the traffic queues at each network device that receives the probe packet. The system may further include a routing table at each network device that receives the probe packet, and the routing table is based upon the gathered information for each respective traffic queue.

The network devices may be members of at least one virtual local area network. The probe packets may include a layer 2 flag and/or sequence/flow/source node IDs.

Each network device may ignore the probe packet if it is busy. At least one of the network devices may provide its extended queue status to the source node in response to receiving the probe packet.

A network device may provide its extended queue status to other network devices in response to receiving the probe packet. The extended queue status includes the number of pings from any flow ID received since the last queue change, the number of packets forwarded since the last queue change, and/or pointers to a complete network device core dump.

If an extended queue status exceeds a threshold level, the source node updates the routing table to rebalance traffic loads. The probe packet is sent in response to the source node receiving a threshold number of congestion notification messages in a given time interval.

Another aspect of the invention is a method to investigate congestion in a computer network that may include sending a probe packet to network devices from a source node to gather information about the traffic queues at each network device that is examined by the probe packet. The method may also include basing a routing table at each network device that receives the probe packet on the gathered information for each respective traffic queue.

The method may further include organizing the network devices into a virtual local area network. The method may additionally include structuring the probe packets to include at least one of a layer 2 flag and sequence/flow/source node IDs.

The method may further include sending an extended queue status of at least one of the network devices to the source node in response to receiving the probe packet. The method may additionally include providing the extended queue status of a network device to other network devices in response to the network device receiving the probe packet.

The method may further comprise including at least one of the number of pings from any flow ID received since the last queue change, the number of packets forwarded since the last queue change, and pointers to a complete network device core dump as part of the extended queue status. The method may additionally include updating the routing table to rebalance traffic loads via the source node if the extended queue status exceeds a threshold level.

Another aspect of the invention is computer readable program code coupled to tangible media to investigate congestion in a computer network. The computer readable program code may be configured to cause the program to send a probe packet to network devices from a source node to gather information about the traffic queues at each network device that is examined by the probe packet. The computer readable program code may also base a routing table at each network device that receives the probe packet on the gathered information for each respective traffic queue.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of a system to investigate congestion in a computer network in accordance with the invention.

FIG. 2 is a flowchart illustrating method aspects according to the invention.

FIG. 3 is a flowchart illustrating method aspects according to the method of FIG. 2.

FIG. 4 is a flowchart illustrating method aspects according to the method of FIG. 2.

FIG. 5 is a flowchart illustrating method aspects according to the method of FIG. 2.

FIG. 6 is a flowchart illustrating method aspects according to the method of FIG. 2.

FIG. 7 is a flowchart illustrating method aspects according to the method of FIG. 5.

FIG. 8 is a flowchart illustrating method aspects according to the method of FIG. 2.

DETAILED DESCRIPTION OF THE INVENTION

The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. Like numbers refer to like elements throughout, like numbers with letter suffixes are used to identify similar parts in a single embodiment, and letter suffix lower case n is a variable that indicates an unlimited number of similar elements.

With reference now to FIG. 1, a system 10 to investigate congestion in a computer network 12 is initially described. The system 10 is a programmable apparatus that stores and manipulates data according to an instruction set as will be appreciated by those of skill in the art.

In one embodiment, the system 10 includes a communications network(s) 12, which enables a signal, e.g. data packet, probe packet, and/or the like, to travel anywhere within, or outside of, system 10. The communications network 12 is wired and/or wireless, for example. The communications network 12 is local and/or global with respect to system 10, for instance.

The system 10 includes network devices 14 a-14 n to route data packets throughout the network 12. The network devices 14 a-14 n are computer network equipment such as switches, network bridges, routers, and/or the like. The network devices 14 a-14 n can be connected together in any configuration to form the communications network 12, as will be appreciated by those of skill in the art.

The system 10 may further include a source node 16 that sends data packets to any of the network devices 14 a-14 n. There can be any number of source nodes 16 in the system 10. The source node 16 is any piece of computer equipment that is able to send data packets to the network devices 14 a-14 n.

The system 10 can also include a routing table 18 a-18 n at each respective network device 14 a-14 n. In another embodiment, the route along which any network device 14 a-14 n sends the data packets is based upon each respective routing table 18 a-18 n.

The network devices 14 a-14 n can be members of at least one virtual local area network 20. The virtual local area network 20 permits the network devices 14 a-14 n to be configured and/or reconfigured with less regard for each network device's 14 a-14 n physical characteristics as such relates to the topology of the communications network 12, as will be appreciated by those of skill in the art. In another embodiment, the source node 16 adds a header to the data packets in order to define the virtual local area network 20.

In one embodiment, the source node 16 sends a probe packet(s) to the network devices 14 a-14 n to gather information about the traffic queues at each network device that receives the probe packet(s). The routing table 18 a-18 n at each network device 14 a-14 n that receives the probe packet may be based upon the gathered information for each respective traffic queue.

In one embodiment, the network devices 14 a-14 n are members of at least one virtual local area network 20. In another embodiment, the probe packets include a layer 2 flag and/or sequence/flow/source node IDs.

Each network device 14 a-14 n can ignore the probe packet if it is busy. In one configuration, at least one of the network devices 14 a-14 n provides its extended queue status to the source node 16 in response to receiving the probe packet.

One of the network devices 14 a-14 n may provide its extended queue status to other network devices in response to receiving the probe packet. The extended queue status can include the number of pings from any flow ID received since the last queue change, the number of packets forwarded since the last queue change, and/or pointers to a complete network device core dump.

In one embodiment, if an extended queue status exceeds a threshold level, the source node 16 updates the routing table 18 a-18 n to rebalance traffic loads within the VLAN 20. The probe packet can be sent in response to the source node 16 receiving a threshold number of congestion notification messages in a given time interval.

In one embodiment, the system 10 additionally includes a destination node 22 that works together with the source node 16 to determine the route the data packets follow through network 12. There can be any number of destination nodes 22 in system 10.

The source node 16 may be configured to collect congestion notification messages from the network devices 14 a-14 n, and map the collected congestion notification messages to the network topology. The system 10 may also include a filter 24 that controls which portions of the congestion notification messages from the network devices 14 a-14 n are used by the source node 16. Thus, the source node 16 can route around any network device 14 a-14 n for which the collected congestion notification messages reveal a history of congestion.

In one embodiment, the source node 16 routes to, or around, any network device 14 a-14 n based upon a link cost indicator 26. The system 10 can further include a destination node 22 that selects the order of the routes.

Another aspect of the invention is a method to investigate congestion in a computer network 12, which is now described with reference to flowchart 30 of FIG. 2. The method begins at Block 32 and may include sending a probe packet to network devices from a source node to gather information about the traffic queues at each network device that is examined by the probe packet at Block 34. The method may also include basing a routing table at each network device that receives the probe packet on the gathered information for each respective traffic queue at Block 36. The method ends at Block 38.

In another method embodiment, which is now described with reference to flowchart 40 of FIG. 3, the method begins at Block 42. The method may include the steps of FIG. 2 at Blocks 34 and 36. The method may additionally include organizing the network devices into a virtual local area network at Block 44. The method ends at Block 46.

In another method embodiment, which is now described with reference to flowchart 48 of FIG. 4, the method begins at Block 50. The method may include the steps of FIG. 2 at Blocks 34 and 36. The method may additionally include structuring the probe packets to include at least one of a layer 2 flag and sequence/flow/source node IDs at Block 52. The method ends at Block 54.

In another method embodiment, which is now described with reference to flowchart 56 of FIG. 5, the method begins at Block 58. The method may include the steps of FIG. 2 at Blocks 34 and 36. The method may additionally include sending an extended queue status of at least one of the network devices to the source node in response to receiving the probe packet at Block 60. The method ends at Block 62.

In another method embodiment, which is now described with reference to flowchart 64 of FIG. 6, the method begins at Block 66. The method may include the steps of FIG. 2 at Blocks 34 and 36. The method may additionally include providing the extended queue status of a network device to other network devices in response to the network device receiving the probe packet at Block 68. The method ends at Block 70.

In another method embodiment, which is now described with reference to flowchart 72 of FIG. 7, the method begins at Block 74. The method may include the steps of FIG. 5 at Blocks 34, 36, and 60. The method may additionally comprise including at least one of the number of pings from any flow ID received since the last queue change, the number of packets forwarded since the last queue change, and/or pointers to a complete network device core dump as part of the extended queue status at Block 76. The method ends at Block 78.

In another method embodiment, which is now described with reference to flowchart 80 of FIG. 8, the method begins at Block 82. The method may include the steps of FIG. 5 at Blocks 34, 36, and 60. The method may additionally include updating the routing table to rebalance traffic loads via the source node if the extended queue status exceeds a threshold level at Block 84. The method ends at Block 86.

In view of the foregoing, the system 10 addresses the investigation of congestion in computer networks 12. For example, large converged networks are prone to congestion and poor performance because they cannot sense and react to potential congestion conditions. System 10 provides a proactive scheme for probing network congestion points, identifying potential congestion areas in advance, and/or preventing them from forming by rerouting traffic along different paths.

In other words, system 10 uses proactive source-based routing, which incorporates an active feedback request command that takes snapshots of the state of the network and uses this information to prevent congestion or other traffic flow problems before they occur. In this approach the source node 16, e.g. traffic source, actively monitors the end-to-end traffic flows by inserting a probing packet, called “feedback request”, into the data stream at periodic intervals. This probe packet will traverse the network 12 (the VLANs plus any alternative paths) and collect information on the traffic queue loads.

In one embodiment, system 10 does not require congestion notification messages (“CNM”) in order to work. In another embodiment, in a network which uses CNMs, the probing can also be triggered by a source having received more than a certain number of CNMs in a given time interval.
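The CNM-triggered variant can be illustrated with a sliding time window. The following is only a minimal sketch: the class name `CnmProbeTrigger`, its parameters, and the choice of a sliding (rather than fixed) interval are assumptions made for illustration and are not specified in this description.

```python
from collections import deque


class CnmProbeTrigger:
    """Hypothetical helper: decide when a source should start probing."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold            # assumed CNM count limit
        self.window_seconds = window_seconds  # assumed length of the time interval
        self._arrivals = deque()              # arrival times of recent CNMs

    def record_cnm(self, now: float) -> bool:
        """Record one CNM received at time `now`.

        Returns True when more than `threshold` CNMs fall inside the window,
        i.e. when probing should be triggered.
        """
        self._arrivals.append(now)
        # Drop CNMs that are older than the window.
        while now - self._arrivals[0] > self.window_seconds:
            self._arrivals.popleft()
        return len(self._arrivals) > self.threshold
```

With a threshold of 2 and a one-second window, the third CNM arriving within the same second would trigger probing, while an isolated CNM arriving later would not.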

A related problem in converged networks is the monitoring and control of adaptive routing fabrics. Most industry standard switches are compliant with IEEE 802.1Qau routing mechanisms. However, they fail to offer a means for delivering adaptive feedback information to the traffic sources before congestion arises in the network.

System 10 addresses the foregoing and greatly enhances the speed of congestion feedback on layer 2 networks, and provides the new function of anticipating probable congestion points before they occur.

In one embodiment, the source node 16 autonomously issues a feedback request command. In another embodiment, the source node 16 begins to issue feedback requests after receiving a set number of congestion notification messages, e.g. as defined in Quantized Congestion Notification (QCN). When feedback requests are returned, system 10 can either count the number of responses per flow ID (stateful approach) or allow the responses to remain anonymous (stateless approach).

In one embodiment, the source node 16 injects a feedback request packet into the network 12 with a layer 2 flag and sequence/flow/RP IDs. The network device 14 a-14 n receives the feedback request. If the network device 14 a-14 n is busy, it may disregard the request; if not, it increments a counter indicating that a feedback request packet has been received. The network device 14 a-14 n then dumps its extended queue status information and returns this data back to the source node 16 that originated the feedback request packet. In another embodiment, the network device 14 a-14 n may also be set to forward the feedback request to other nodes in the network 12.
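The handling of a feedback request at a network device can be sketched as follows. This is an illustrative model only: the class and field names (`FeedbackRequest`, `ExtendedQueueStatus`, `NetworkDevice`, `core_dump_pointer`) are hypothetical, and representing the core dump pointer as an optional integer is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackRequest:
    sequence_id: int
    flow_id: int
    source_id: int  # identifies the source node that should receive the reply


@dataclass
class ExtendedQueueStatus:
    pings_since_queue_change: int
    packets_forwarded_since_queue_change: int
    core_dump_pointer: Optional[int]  # hypothetical handle; None if no dump exists


class NetworkDevice:
    def __init__(self) -> None:
        self.busy = False
        self.pings_since_queue_change = 0
        self.packets_forwarded_since_queue_change = 0
        self.core_dump_pointer: Optional[int] = None

    def handle_feedback_request(
        self, request: FeedbackRequest
    ) -> Optional[ExtendedQueueStatus]:
        """Return the status to send back to request.source_id, or None if ignored."""
        if self.busy:
            # A busy device may disregard the request entirely.
            return None
        # Count the received feedback request before dumping the queue status.
        self.pings_since_queue_change += 1
        return ExtendedQueueStatus(
            pings_since_queue_change=self.pings_since_queue_change,
            packets_forwarded_since_queue_change=self.packets_forwarded_since_queue_change,
            core_dump_pointer=self.core_dump_pointer,
        )
```

In this sketch a `None` result stands for a disregarded request, so the source node simply receives no reply from a busy device.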

In one embodiment, the extended queue status may include the number of pings from any flow ID received since the last queue change, the number of packets forwarded since the last queue change, and/or the pointers to a complete CP core dump.

In one embodiment, the feedback requests may be triggered by QCN frames, so that any rate-limiting traffic flows are probed. For example, system 10 might send one feedback request frame for every N kilobytes of data sent per flow (e.g. N=750 kB). This provides the potential for early response to pending congestion points. Using the information obtained from feedback requests, source adaptive routing may be employed to stop network 12 congestion before it happens. This information also makes it possible to optimize traffic flows according to latency, throughput, or other user requirements.
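The per-flow pacing of feedback requests can be sketched with a simple byte counter. The class name `FlowProbeScheduler` is hypothetical, and interpreting 750 kB as 750,000 bytes is an assumption.

```python
class FlowProbeScheduler:
    """Hypothetical helper: emit one feedback request per N bytes sent on a flow."""

    def __init__(self, bytes_per_probe: int = 750_000) -> None:
        # Assumption: 750 kB is taken as 750,000 bytes.
        self.bytes_per_probe = bytes_per_probe
        self._bytes_since_probe = {}  # flow ID -> bytes sent since the last probe

    def on_data_sent(self, flow_id: int, num_bytes: int) -> int:
        """Account for sent data and return how many feedback requests are now due."""
        total = self._bytes_since_probe.get(flow_id, 0) + num_bytes
        probes_due, remainder = divmod(total, self.bytes_per_probe)
        # Carry the leftover bytes forward so no traffic goes uncounted.
        self._bytes_since_probe[flow_id] = remainder
        return probes_due
```

Each flow is counted separately, so a heavily loaded flow is probed more often than a lightly loaded one.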

In one embodiment, system 10 uses QCN messaging on a converged network. The detailed queue information is already available in the network device 14 a-14 n, e.g. switch CP, but it needs to be formatted and collected by the source node 16.

The performance overhead has been demonstrated to be less than 1% for feedback request monitoring in software. The overhead limits can be further reduced if desired by allowing the source node 16 and network devices 14 a-14 n to adjust the frequency of feedback control requests. This approach further enhances the value of enabled switch fabrics and allows for more effective use of source based adaptive routing.

In one embodiment, in a CEE/FCoE network 12 having a plurality of VLANs 20 each having a plurality of network devices 14 a-14 n, e.g. switches, which enable paths over which traffic can be routed through the network, a method for locating potential congestion points in the network is described.

As noted above, large converged networks do not define adequate means to control network congestion, leading to traffic delays, dropped data frames, and poor performance. The conventional hop-by-hop routing is not efficient at dealing with network congestion, especially when a combination of storage and networking traffic is placed over a common network, resulting in new and poorly characterized traffic statistics. If the benefits of converged networking are to be realized, a new method of traffic routing is required. To address such, system 10 uses a source based, reactive, and adaptive routing scheme.

In one embodiment, system 10 adds a virtual LAN (VLAN) 20 routing table 18 a-18 n in every network device 14 a-14 n, e.g. switches. The VLAN 20 is defined by a 12 bit header field appended to all packets (hence this is a source-based routing scheme), plus a set of routing table 18 a-18 n entries (in all the switches) that can route the VLANs.

The 12 bit VLAN 20 ID is in addition to the usual packet header fields, and it triggers the new VLAN 20 routing scheme in each network device 14 a-14 n. Each network device 14 a-14 n has its own routing entry for every active VLAN 20.
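As an illustration of a 12 bit VLAN ID field, the sketch below packs and unpacks a tag laid out like a standard IEEE 802.1Q tag (a 16 bit tag protocol identifier followed by 3 priority bits, 1 drop-eligible bit, and the 12 bit VLAN ID). That layout is an assumption; this description states only that the field is 12 bits wide. The function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an IEEE 802.1Q tag


def pack_vlan_tag(vlan_id: int, priority: int = 0, drop_eligible: bool = False) -> bytes:
    """Build a 4-byte tag whose low 12 bits carry the VLAN ID."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | (int(drop_eligible) << 12) | vlan_id
    # "!HH" = two unsigned 16-bit integers in network (big-endian) byte order.
    return struct.pack("!HH", TPID_8021Q, tci)


def unpack_vlan_id(tag: bytes) -> int:
    """Extract the 12-bit VLAN ID from a 4-byte tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF
```

A 12 bit field allows 4,096 distinct values, which is consistent with the figure of roughly 4,000 VLANs mentioned elsewhere in this description.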

In one embodiment, source node 16 and destination node 22 use a global selection function to decide the optimal end-to-end path for the traffic flows. The optimal end-to-end path is then pre-loaded into the network devices 14 a-14 n, e.g. switches, which are members of this VLAN 20.

In one embodiment, the VLAN 20 table 18 a-18 n is adaptive and will be periodically updated. The refresh time of the routing table 18 a-18 n can be varied, but will probably be at least a few seconds for a reasonably large number (4,000 or so) of VLANs 20. The data traffic subject to optimization will use the VLANs 20 as configured by the controlling sources/applications 16.

In one embodiment, congestion notification messages (CNMs) from the network devices 14 a-14 n, e.g. fabric switches, are collected by the traffic source 16, marking the switch and port locations based on the port ID. Every traffic source 16 builds a history of CNMs that it has received, which is mapped to the network topology. Based on the source's 16 historical mapping of global end-to-end paths, the source will reconfigure any overloaded paths, defined by the VLAN 20 tables 18 a-18 n, to route around the most persistent congestion points (signaled by the enabled switches).
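One way to picture the CNM history kept by a traffic source is a counter keyed by switch and port, evaluated against the hops of each candidate path. The sketch below is illustrative only: the class `CnmHistory`, the string switch IDs, and the use of a plain CNM count as the congestion measure are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Hop = Tuple[str, int]  # (switch ID, port ID); identifiers are hypothetical


class CnmHistory:
    """Hypothetical per-source record of CNMs, keyed by where they originated."""

    def __init__(self) -> None:
        self._counts = Counter()

    def record(self, switch_id: str, port_id: int) -> None:
        """Note one CNM received from the given switch port."""
        self._counts[(switch_id, port_id)] += 1

    def path_congestion(self, path: List[Hop]) -> int:
        """Total number of CNMs seen from the hops along a path."""
        return sum(self._counts[hop] for hop in path)

    def least_congested(self, paths: Dict[int, List[Hop]]) -> int:
        """Return the VLAN ID whose path has the fewest recorded CNMs."""
        return min(paths, key=lambda vlan_id: self.path_congestion(paths[vlan_id]))
```

Here each VLAN ID maps to one end-to-end path, so choosing the least congested entry corresponds to routing around the most persistent congestion points.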

In one embodiment, for each destination, the source 16 knows all the possible paths a packet can take. The source 16 can then evaluate the congestion level along each of these paths and choose the one with the smallest cost, and therefore the method is adaptive.

In another embodiment, the order in which the paths are selected is given by the destination 22. In the case that no CNMs are received, the source 16 will default to the same path used by conventional and oblivious methods.

In one embodiment, if the default path is congested, the alternative paths are checked next (by comparing their congestion cost), starting with the default one, in a circular search, until a non-congested path is found. Otherwise, the first path with the minimum congestion cost is chosen.
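The circular search can be sketched as below. The function name `select_path` is hypothetical, and treating a path as non-congested when its cost does not exceed a `congestion_threshold` is an assumption, since this description does not define the test for a non-congested path.

```python
from typing import Sequence


def select_path(
    costs: Sequence[float], default_index: int, congestion_threshold: float
) -> int:
    """Pick a path index by circular search starting at the default path.

    Returns the first path whose cost is at or below the threshold. If every
    path is congested, returns the first path (in search order) that has the
    minimum congestion cost.
    """
    if not costs:
        raise ValueError("at least one path is required")
    count = len(costs)
    best_index = default_index
    for offset in range(count):
        index = (default_index + offset) % count  # wrap around the path list
        if costs[index] <= congestion_threshold:
            return index
        # Strict comparison keeps the earliest path among equal minima.
        if costs[index] < costs[best_index]:
            best_index = index
    return best_index
```

When no path is congested the default path is returned immediately, which matches the fallback to the conventional path when no CNMs are received.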

In another embodiment, the CNMs are used as link cost indicators 26. System 10 defines both a global and local method of cost weighting, plus a filtering scheme to enhance performance.

In this manner, system 10 can determine where the most congested links are located in the network 12. For each destination 22, the source 16 knows all the possible paths a packet can take. The source 16 can then evaluate the congestion level along each of these paths and choose the one with the smallest cost, and therefore the method is adaptive.

In one embodiment, the system 10 uses at least one of two different methods of computing the path cost. The first is a global price, which is the (weighted) sum of the congestion levels on each link of the path. The other is the local price, which is the maximum (weighted) congestion level of a link of the path.

The intuition behind the local price method is that a path where a single link experiences heavy congestion is worse than a path where multiple links experience mild congestion. On the other hand, a path with two heavily congested links is worse than a path with a single heavily congested link.

The intuition behind using a global price method is that the CNMs received from distant network devices 14 a-14 n, e.g. switches, are more informative than those received from switches that are close to the source 16. This happens because the congestion appears on the links that are likely to concentrate more flows (i.e. the links that are farther away from the source).
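The two path cost methods can be written out directly from their definitions. In this sketch the function names, the integer congestion levels, and the unit weights used in the example are assumptions chosen for illustration.

```python
from typing import Sequence


def global_price(congestion: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the congestion levels on each link of the path."""
    if len(congestion) != len(weights):
        raise ValueError("one weight per link is required")
    return sum(w * c for w, c in zip(weights, congestion))


def local_price(congestion: Sequence[float], weights: Sequence[float]) -> float:
    """Maximum weighted congestion level over the links of the path."""
    if len(congestion) != len(weights):
        raise ValueError("one weight per link is required")
    return max((w * c for w, c in zip(weights, congestion)), default=0)


# Example with three-link paths and unit weights (values are made up).
unit = [1, 1, 1]
one_heavy = [8, 0, 0]   # a single heavily congested link
all_mild = [3, 3, 3]    # several mildly congested links
two_heavy = [8, 8, 0]   # two heavily congested links
```

With these values the local price ranks `one_heavy` (8) as worse than `all_mild` (3), while only the global price separates `two_heavy` (16) from `one_heavy` (8). Weights that grow with distance from the source would be one way to give more influence to CNMs from distant switches.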

In one embodiment, to avoid high frequency noise which could lead to instabilities in the updating process of the network devices 14 a-14 n, e.g. switches, the system 10 applies filter 24 to the incoming stream of CNMs. The filter 24 is a low pass filter, for example. The filter 24 would have a running time window to average and smooth the CNM stream.
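A running time window average is one simple form of low pass filter. The sketch below assumes that each CNM carries a numeric congestion value and a timestamp; the class name `RunningWindowFilter` and the window length are hypothetical.

```python
from collections import deque


class RunningWindowFilter:
    """Hypothetical moving-average filter over a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, value) pairs inside the window
        self._total = 0.0

    def update(self, now: float, value: float) -> float:
        """Add one CNM sample and return the smoothed (average) value."""
        self._samples.append((now, value))
        self._total += value
        # Evict samples older than the window; the newest sample always stays.
        while now - self._samples[0][0] > self.window_seconds:
            _, old_value = self._samples.popleft()
            self._total -= old_value
        return self._total / len(self._samples)
```

A longer window smooths more aggressively but reacts more slowly to a real change in congestion, so the window length trades stability against responsiveness.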

In one embodiment, the source 16 will periodically refresh, and if necessary update, the VLAN 20 path information in the affected network devices 14 a-14 n. In another embodiment, the optimal path routing is calculated by the end points, e.g. source 16 and destination 22, and refreshed periodically throughout the switch fabric.

The system 10 can be implemented in hardware, software, and/or firmware. Another aspect of the invention is computer readable program code coupled to tangible media to investigate congestion in a computer network 12. The computer readable program code may be configured to cause the program to send a probe packet to network devices 14 a-14 n from a source node 16 to gather information about the traffic queues at each network device that is examined by the probe packet. The computer readable program code may also base a routing table 18 a-18 n at each network device 14 a-14 n that receives the probe packet on the gathered information for each respective traffic queue.

As will be appreciated by one skilled in the art, aspects of the invention may be embodied as a system, method or computer program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

While the preferred embodiment to the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.

1. A system comprising: network devices to route data packets throughout a network; a source node that sends a probe packet to the network devices to gather information about the traffic queues at each network device that receives the probe packet; and a routing table at each network device that receives the probe packet, and the routing table is based upon the gathered information for each respective traffic queue.
2. The system of claim 1, wherein the network devices are members of at least one virtual local area network.
3. The system of claim 1, wherein the probe packets include at least one of a layer 2 flag and sequence/flow/source node IDs.
4. The system of claim 1, wherein each network device can ignore the probe packet if it is busy.
5. The system of claim 1, wherein at least one of the network devices provides its extended queue status to the source node in response to receiving the probe packet.
6. The system of claim 1, wherein a network device provides its extended queue status to other network devices in response to receiving the probe packet.
7. The system of claim 5, wherein the extended queue status includes at least one of the number of pings from any flow ID received since the last queue change, the number of packets forwarded since the last queue change, and pointers to a complete network device core dump.
8. The system of claim 5, wherein if an extended queue status exceeds a threshold level, the source node updates the routing table to rebalance traffic loads.
9. The system of claim 1, wherein the probe packet is sent in response to the source node receiving a threshold number of congestion notification messages in a given time interval.
10. A method comprising: sending a probe packet to network devices from a source node to gather information about the traffic queues at each network device that is examined by the probe packet; and basing a routing table at each network device that receives the probe packet on the gathered information for each respective traffic queue.
11. The method of claim 10, further comprising organizing the network devices into a virtual local area network.
12. The method of claim 10, further comprising structuring the probe packets to include at least one of a layer 2 flag and sequence/flow/source node IDs.
13. The method of claim 10, further comprising sending an extended queue status of at least one of the network devices to the source node in response to receiving the probe packet.
14. The method of claim 10, further comprising providing an extended queue status of a network device to other network devices in response to the network device receiving the probe packet.
 15. The method of claim 13, further comprising includingat least one of the number of pings from any flow ID received since thelast queue change, the number of packets forwarded since the last queuechange, and pointers to a complete network device core dump as part ofthe extended queue status.
 16. The method of claim 13, furthercomprising updating the routing table to rebalance traffic loads via thesource node if the extended queue status exceeds a threshold level. 17.A computer program product embodied in a tangible media comprising:computer readable program codes coupled to the tangible media toinvestigate congestion in a computer network, the computer readableprogram codes configured to cause the program to: send a probe packet tonetwork devices from a source node to gather information about thetraffic queues at each network device that is examined by the probepacket; and base a routing table at each network device that receivesthe probe packet on the gathered information for respective each trafficqueue.
 18. The computer program product of claim 17, further comprisingprogram code configured to at least one of: organize the network devicesinto a virtual local area network; and structure the probe packets toinclude at least one of a layer 2 flag and sequence/flow/source nodeIDs.
 19. The computer program product of claim 17, further comprisingprogram code configured to: send an extended queue status of at leastone of the network devices to the source node in response to receivingthe probe packet.
 20. The computer program product of claim 19, furthercomprising program code configured to at least one of: provide anextended queue status of a network device to other network devices inresponse to the network device receiving the probe packet; include atleast one of the number of pings from any flow ID received since thelast queue change, the number of packets forwarded since the last queuechange, and pointers to a complete network device core dump as part ofthe extended queue status; and update the routing table to rebalancetraffic loads via the source node if the extended queue status exceeds athreshold level.
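
By way of illustration only, and without limiting the claims, the source-node behavior recited above (sending a probe after a threshold number of congestion notification messages in a given time interval, receiving an extended queue status, and updating the routing table when that status exceeds a threshold level) may be sketched in Python as follows. All class, field, and parameter names are hypothetical. The choice to compare the forwarded-packet count against the threshold, and to rebalance by moving the congested device to the end of each next-hop list, are assumptions made for this sketch and are not taken from the specification.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExtendedQueueStatus:
    """Items named in the claims; the field names are hypothetical."""
    pings_since_queue_change: int
    packets_forwarded_since_queue_change: int
    core_dump_pointer: Optional[str] = None  # assumed to be an opaque reference


class SourceNode:
    """Minimal source-node sketch (all names are assumptions)."""

    def __init__(self, cn_threshold: int, interval_s: float, load_threshold: int):
        self.cn_threshold = cn_threshold      # notifications needed to trigger a probe
        self.interval_s = interval_s          # length of the sliding time window
        self.load_threshold = load_threshold  # assumed limit on forwarded packets
        self._cn_times = deque()              # timestamps of recent notifications
        # destination -> next hops in order of preference (assumed table layout)
        self.routing_table: Dict[str, List[str]] = {}

    def on_congestion_notification(self, now: float) -> bool:
        """Record one notification; return True when a probe should be sent."""
        self._cn_times.append(now)
        # Discard notifications that fall outside the time interval.
        while self._cn_times and now - self._cn_times[0] > self.interval_s:
            self._cn_times.popleft()
        if len(self._cn_times) >= self.cn_threshold:
            # Assumption: start a fresh window once a probe has been triggered.
            self._cn_times.clear()
            return True
        return False

    def on_queue_status(self, device: str, status: ExtendedQueueStatus) -> bool:
        """Rebalance if the reported status exceeds the threshold level."""
        if status.packets_forwarded_since_queue_change <= self.load_threshold:
            return False
        # Assumed rebalancing policy: make the congested device the last choice.
        for hops in self.routing_table.values():
            if device in hops:
                hops.remove(device)
                hops.append(device)
        return True
```

In this sketch, a caller would invoke on_congestion_notification for each incoming notification, send a probe packet when it returns True, and pass each returned status to on_queue_status. The probe packet format itself (layer 2 flag and sequence/flow/source node IDs) is not modeled here.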